
Base classes for accelerator refactoring #5715

Merged
22 commits merged into release/1.2-dev on Jan 30, 2021
Conversation

@justusschock (Member) commented Jan 30, 2021

What does this PR do?

Adds the base classes from #5616

Before submitting

  • Was this discussed/approved via a GitHub issue? (not for typos and docs)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes? (if necessary)
  • Did you write any new necessary tests? (not for typos and docs)
  • Did you verify new and existing tests pass locally with your changes?
  • Did you update the CHANGELOG? (not for typos, docs, test updates, or internal minor changes/refactorings)

PR review

Anyone in the community is free to review the PR once the tests have passed.
Before you start reviewing, make sure you have read the Review guidelines. In short, see the following bullet list:

  • Is this pull request ready for review? (if not, please submit in draft mode)
  • Check that all items from Before submitting are resolved
  • Make sure the title is self-explanatory and the description concisely explains the PR
  • Add labels and milestones (and optionally projects) to the PR so it can be classified
  • Check that target branch and milestone match!

Did you have fun?

Make sure you had fun coding 🙃

Co-Authored with @awaelchli

@pep8speaks commented Jan 30, 2021

Hello @justusschock! Thanks for updating this PR.

Line 261:121: E501 line too long (121 > 120 characters)
Line 375:26: W292 no newline at end of file

Line 57:14: W292 no newline at end of file

Line 1:1: W391 blank line at end of file

Comment last updated at 2021-01-30 19:15:52 UTC

@justusschock justusschock added this to the 1.2 milestone Jan 30, 2021
@awaelchli (Contributor) left a comment


looking forward to it

@SeanNaren (Contributor) left a comment


Looks good!

Comment on lines +148 to +150
with self.precision_plugin.val_step_context():
with self.training_type_plugin.val_step_context():
return self.lightning_module.validation_step(*args)
@SeanNaren (Contributor)

This type of logic makes me think the precision plugin should live in the training type plugin, and that the base training type plugin should manage the precision logic.

@justusschock (Member Author)

Let's merge it as is and then change it later on :)

@ananthsub (Contributor)

agreed with @SeanNaren - that cleans up the interface for the accelerator then. conceptually the accelerator simply manages the training plugin with additional lifecycle hooks, and the training plugin is responsible for precision/whatever else(?)

@justusschock (Member Author)

Yep, I also agree with @SeanNaren, and there are plans to support that on top of the existing PoC, since it will probably be easier to change then than it is now.
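
A minimal sketch of the restructuring suggested in this thread: the training type plugin owns the precision plugin and exposes a single context, so the accelerator no longer touches precision directly. Class and method names below are illustrative only, not the final Lightning API.

from contextlib import contextmanager


class PrecisionPlugin:
    @contextmanager
    def val_step_context(self):
        # precision-specific setup (e.g. autocast) would go here
        yield


class TrainingTypePlugin:
    def __init__(self, precision_plugin):
        # hypothetical: the training type plugin owns the precision plugin
        self.precision_plugin = precision_plugin

    @contextmanager
    def val_step_context(self):
        # enter the precision context first (as in the quoted code), then any
        # training-type-specific setup, then hand control back to the caller
        with self.precision_plugin.val_step_context():
            yield


class Accelerator:
    def __init__(self, training_type_plugin, lightning_module):
        self.training_type_plugin = training_type_plugin
        self.lightning_module = lightning_module

    def validation_step(self, *args):
        # the accelerator only delegates; it no longer knows about precision
        with self.training_type_plugin.val_step_context():
            return self.lightning_module.validation_step(*args)

With this layout the accelerator's validation_step shrinks to a single with-statement, which is the clean-up of the accelerator interface described above.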

@Borda (Member) commented Jan 30, 2021

replacing usage of all non-existent Plugins and annotating them with FIXME

@Borda (Member) left a comment

For all these @abstractmethod methods that also raise NotImplementedError, use only one of the two (see the sketch below):

  • with @abstractmethod, the class cannot be instantiated until all these methods are implemented
  • with raise NotImplementedError, the class can be instantiated, but will crash when one of these methods is called
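
For illustration, a tiny example of the difference (class and method names are made up for the example):

from abc import ABC, abstractmethod


class WithAbstractMethod(ABC):
    @abstractmethod
    def barrier(self):
        ...


class WithNotImplemented:
    def barrier(self):
        raise NotImplementedError


# WithAbstractMethod() fails immediately with a TypeError, because the
# abstract method has not been overridden.
# WithNotImplemented() instantiates fine; the failure is deferred until
# barrier() is actually called, which raises NotImplementedError.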

@codecov bot commented Jan 30, 2021

Codecov Report

Merging #5715 (1c8771a) into release/1.2-dev (21d313e) will decrease coverage by 1%.
The diff coverage is 100%.

@@               Coverage Diff                @@
##           release/1.2-dev   #5715    +/-   ##
================================================
- Coverage               89%     88%    -1%     
================================================
  Files                  168     169     +1     
  Lines                12423   12673   +250     
================================================
+ Hits                 11102   11180    +78     
- Misses                1321    1493   +172     

justusschock and others added 7 commits January 30, 2021 20:09
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
Co-authored-by: Jirka Borovec <Borda@users.noreply.github.com>
@Borda Borda enabled auto-merge (squash) January 30, 2021 19:38
@Borda Borda merged commit 5d239cc into release/1.2-dev Jan 30, 2021
@Borda Borda deleted the ref/base_classes branch January 30, 2021 19:55
@ananthsub (Contributor) left a comment


@justusschock @awaelchli @SeanNaren what's the expected control flow across trainer / training loop / accelerator / training type plugin? I'm getting mixed up trying to figure out what will call what, and when.


import torch


class Plugin(object):
@ananthsub (Contributor)

Are all of these needed for all plugins? Are there any steps here that we could/should move to the training type plugin?

@Borda (Member) commented Jan 30, 2021

@ananthsub good comments, let's address them in follow-up PRs :] #5718



class TrainingTypePlugin(Plugin, ABC):
"""A Plugin to change the behaviour of the training, validation and test-loop."""
Contributor

@justusschock - another question: currently these training loop functions are set on the accelerator backend in fit() - should these be represented in the interface here or on the accelerator?
What do you think about the training type plugin accepting these as constructor arguments, assuming we can define some base interfaces for train/evaluation/test loop?

@justusschock (Member Author)

I don't think this is needed anymore, since we pass the Trainer to start_training etc., where we can directly call trainer.train.
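
A minimal sketch of the pattern described in this reply, where the Trainer is passed into start_training so no loop objects need to be injected via the plugin constructor (simplified names, not the exact Lightning signatures):

class TrainingTypePlugin:
    def start_training(self, trainer):
        # the plugin receives the Trainer and hands control straight back,
        # so no train/eval/test loop objects have to be wired in up front
        return trainer.train()


# hypothetical call site during Trainer.fit():
#     self.training_type_plugin.start_training(self)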

@Borda Borda added the "ready" (PRs ready to be merged) label Jan 31, 2021
Labels: ready (PRs ready to be merged), refactor
Projects: None yet
6 participants